SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Abstract

Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most of the existing datasets focus on understanding the content relationships within a single image and not across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k questions about a slide deck. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning, and also provides annotated arithmetic expressions of numerical answers for enhancing the ability of numerical reasoning. Moreover, we developed a new end-to-end document VQA model that treats evidence selection and question answering in a unified sequence-to-sequence format. Experiments on SlideVQA show that our model outperformed existing state-of-the-art QA models, but that it still has a large gap behind human performance. We believe that our dataset will facilitate research on document VQA.

Similar resources

Question Answering on the SQuAD Dataset

We develop a deep learning framework for question answering on the Stanford Question Answering Dataset (SQuAD), blending ideas from existing state-of-the-art models to achieve results that surpass the original logistic regression baselines. Using a dynamic coattention encoder and an LSTM decoder, we achieved an F1 score of 55.9% on the hidden SQuAD test set. In this paper, we present the methodo...

SQuAD Question Answering Dataset: CS224N Assn 4

We solve the contextual question answering problem, which is an essential part of many automated question-answering datasets. Recently the SQuAD dataset [1] was uploaded, and several deep learning approaches have been proposed to solve it. We implement a modified version of one of them, the Dynamic Coattention model, as well as a simple baseline.

Solving the Prerequisites: Improving Question Answering on the bAbI Dataset

The aim of this project is to make progress towards building a machine learning agent that understands natural language and can perform basic reasoning. Towards this nebulous goal, we focus on question answering: Can an agent answer a query based on a given set of natural language facts? We combine LSTM sentence embedding models with an attention mechanism and obtain good results on the Faceboo...

Toward a Document Model for Question Answering Systems

The problem of acquiring valuable information from the large amounts available today in electronic media requires automated mechanisms more natural and efficient than those already existing. The trend in the evolution of information retrieval systems goes toward systems capable of answering specific questions formulated by the user in her/his language. The expected answers from such systems are...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26598